HyDe: The First Open-Source, Python-Based, GPU-Accelerated Hyperspectral Denoising Package
As with any physical instrument, hyperspectral cameras induce different kinds
of noise in the acquired data. Therefore, hyperspectral denoising is a crucial
step for analyzing hyperspectral images (HSIs). Conventional computational
methods rarely use GPUs to improve efficiency and are not fully open-source.
Alternatively, deep learning-based methods are often open-source and use GPUs,
but their training and utilization for real-world applications remain
non-trivial for many researchers. Consequently, we propose HyDe: the first
open-source, GPU-accelerated, Python-based hyperspectral image denoising
toolbox, which aims to provide a large set of methods with an easy-to-use
environment. HyDe includes a variety of methods ranging from low-rank
wavelet-based methods to deep neural network (DNN) models. HyDe's interface
dramatically improves the interoperability of these methods and the performance
of the underlying functions. In fact, these methods maintain similar HSI
denoising performance to their original implementations while consuming nearly
ten times less energy. Furthermore, we present a method for training DNNs for
denoising HSIs which are not spatially related to the training dataset, i.e.,
training on ground-level HSIs for denoising HSIs with other perspectives
including airborne, drone-borne, and space-borne. To utilize the trained DNNs,
we show a sliding window method to effectively denoise HSIs which would
otherwise require more than 40 GB. The package can be found at:
https://github.com/Helmholtz-AI-Energy/HyDe.
Comment: 5 pages
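The sliding window idea mentioned in the abstract can be illustrated with a minimal sketch. It is not HyDe's implementation: the function names, the single-band list-of-lists image, and the choice to average overlapping tiles are all assumptions made for illustration. The point is only that the denoiser sees one tile at a time, so peak memory depends on the window size rather than on the full image.

```python
def tile_starts(size, window, stride):
    """Start offsets of windows that together cover [0, size).

    Assumes stride <= window, otherwise gaps would appear between tiles.
    """
    if size <= window:
        return [0]
    starts = list(range(0, size - window, stride))
    starts.append(size - window)  # final window sits flush with the edge
    return starts


def sliding_window_denoise(image, denoise_tile, window=4, stride=2):
    """Denoise `image` (rows x cols list of lists) one tile at a time.

    `denoise_tile` is a hypothetical callable that maps a tile to a tile
    of the same shape. Overlapping predictions are averaged.
    """
    rows, cols = len(image), len(image[0])
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for r0 in tile_starts(rows, window, stride):
        for c0 in tile_starts(cols, window, stride):
            tile = [row[c0:c0 + window] for row in image[r0:r0 + window]]
            cleaned = denoise_tile(tile)
            for i, cleaned_row in enumerate(cleaned):
                for j, value in enumerate(cleaned_row):
                    total[r0 + i][c0 + j] += value
                    count[r0 + i][c0 + j] += 1
    return [[total[r][c] / count[r][c] for c in range(cols)]
            for r in range(rows)]
```

With an identity function as `denoise_tile`, the output reproduces the input, which is a quick check that every pixel is covered and the overlap averaging is consistent.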
Feed-Forward Optimization With Delayed Feedback for Neural Networks
Backpropagation has long been criticized for being biologically implausible,
relying on concepts that are not viable in natural learning processes. This
paper proposes an alternative approach to solve two core issues, i.e., weight
transport and update locking, for biological plausibility and computational
efficiency. We introduce Feed-Forward with delayed Feedback (F³), which
improves upon prior work by utilizing delayed error information as a
sample-wise scaling factor to approximate gradients more accurately. We find
that F³ reduces the gap in predictive performance between biologically
plausible training algorithms and backpropagation by up to 96%. This
demonstrates the applicability of biologically plausible training and opens up
promising new avenues for low-energy training and parallelization.
Dynamic particle swarm optimization of biomolecular simulation parameters with flexible objective functions
Molecular simulations are a powerful tool to complement and interpret ambiguous experimental data on biomolecules to obtain structural models. Such data-assisted simulations often rely on parameters, the choice of which is highly non-trivial and crucial to performance. The key challenge is weighting experimental information with respect to the underlying physical model. We introduce FLAPS, a self-adapting variant of dynamic particle swarm optimization, to overcome this parameter selection problem. FLAPS is suited for the optimization of composite objective functions that depend on both the optimization parameters and additional, a priori unknown weighting parameters, which substantially influence the search-space topology. These weighting parameters are learned at runtime, yielding a dynamically evolving and iteratively refined search-space topology. As a practical example, we show how FLAPS can be used to find functional parameters for small-angle X-ray scattering-guided protein simulations.
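For orientation, the following is a minimal sketch of plain particle swarm optimization, the baseline that FLAPS builds on. It does not implement FLAPS itself: the runtime learning of weighting parameters described in the abstract is not modelled, and the function name, default coefficients, and test objective are assumptions chosen for illustration.

```python
import random


def particle_swarm_minimize(objective, bounds, n_particles=20, n_iters=200,
                            inertia=0.72, cognitive=1.49, social=1.49, seed=0):
    """Standard PSO with a fixed objective (illustrative, not FLAPS)."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=personal_val.__getitem__)
    global_best = personal_best[best_idx][:]
    global_val = personal_val[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move, then clamp the coordinate to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], value
                if value < global_val:
                    global_best, global_val = positions[i][:], value
    return global_best, global_val
```

Calling `particle_swarm_minimize(lambda p: sum(x * x for x in p), [(-5, 5), (-5, 5)])` minimizes a simple quadratic. In a FLAPS-like setting, the objective would additionally change during the run as the weighting parameters are updated.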
Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)
With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, as compared to current optimized data parallel training methods.
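The hierarchical part of this scheme can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the DASO algorithm: gradients are plain Python lists, communication is simulated in a single process, and the asynchronous overlap and adaptive synchronization rate are reduced to a hypothetical `do_global_sync` flag.

```python
def mean_vectors(vectors):
    """Element-wise mean of equally long gradient vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def hierarchical_average(gradients_by_node, do_global_sync=True):
    """Average gradients first within each node, then across nodes.

    `gradients_by_node[k]` holds one gradient vector per GPU on node k.
    Returns one averaged gradient per node.
    """
    # Step 1: cheap node-local averaging (fast intra-node links).
    node_means = [mean_vectors(node) for node in gradients_by_node]
    if not do_global_sync:
        # Skipped global step: every node continues with its local average.
        return node_means
    # Step 2: averaging across nodes (slower inter-node network).
    global_mean = mean_vectors(node_means)
    return [global_mean[:] for _ in node_means]
```

When every node has the same number of GPUs, the two-step average equals the flat average over all GPUs, so skipping the global step only on some iterations trades gradient freshness for less inter-node traffic.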
Massively Parallel Genetic Optimization through Asynchronous Propagation of Populations
We present Propulate, an evolutionary optimization algorithm and software
package for global optimization and in particular hyperparameter search. For
efficient use of HPC resources, Propulate omits the synchronization after each
generation as done in conventional genetic algorithms. Instead, it steers the
search with the complete population present at time of breeding new
individuals. We provide an MPI-based implementation of our algorithm, which
features variants of selection, mutation, crossover, and migration and is easy
to extend with custom functionality. We compare Propulate to the established
optimization tool Optuna. We find that Propulate is up to three orders of
magnitude faster without sacrificing solution accuracy, demonstrating the
efficiency and efficacy of our lazy synchronization approach. Code and
documentation are available at https://github.com/Helmholtz-AI-Energy/propulate.
Comment: 18 pages, 5 figures, submitted to ISC High Performance 202
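The idea of breeding without a per-generation barrier can be sketched as a small event-driven simulation. This is not Propulate's code or API: the function name, the Gaussian mutation, the top-5 parent selection, and the simulated evaluation times are all assumptions made for illustration, and MPI and migration are left out.

```python
import heapq
import random


def async_evolution(loss, n_workers=4, n_evals=400, sigma=0.3, seed=0):
    """Steady-state evolutionary search over one real parameter.

    Each simulated worker breeds a new individual as soon as its own
    evaluation finishes, using whatever results are known at that time.
    """
    rng = random.Random(seed)
    population = []  # (loss, x) for every finished evaluation
    events = []      # (finish_time, worker_id, x) of running evaluations
    for worker in range(n_workers):
        x = rng.uniform(-5.0, 5.0)
        heapq.heappush(events, (rng.uniform(0.5, 1.5), worker, x))

    finished = 0
    while finished < n_evals:
        now, worker, x = heapq.heappop(events)  # next evaluation to finish
        population.append((loss(x), x))
        finished += 1
        # No waiting for the other workers: select from the current population.
        parents = sorted(population)[:5]
        _, parent_x = rng.choice(parents)
        child = parent_x + rng.gauss(0.0, sigma)  # Gaussian mutation
        heapq.heappush(events, (now + rng.uniform(0.5, 1.5), worker, child))
    return min(population)  # (best_loss, best_x)
```

Because workers finish at different simulated times, a slow evaluation never stalls the others, which is the property the abstract attributes to omitting the synchronization step.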
HeAT – a Distributed and GPU-accelerated Tensor Framework for Data Analytics
In order to cope with the exponential growth in available data, the efficiency of data analysis and machine learning libraries has recently received increased attention. Although corresponding array-based numerical kernels have been significantly improved, most are limited by the resources available on a single computational node. Consequently, kernels must exploit distributed resources, e.g., distributed memory architectures. To this end, we introduce HeAT, an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload via MPI on arbitrarily large high-performance computing systems. It provides both low-level array-based computations and assorted higher-level algorithms. With HeAT, it is possible for a NumPy user to take advantage of their available resources, significantly lowering the barrier to distributed data analysis. Compared with applications written in similar frameworks, HeAT achieves speedups of up to two orders of magnitude.
HeAT – a Distributed and GPU-accelerated Tensor Framework for Data Analytics
To cope with the rapid growth in available data, the efficiency of data
analysis and machine learning libraries has recently received increased
attention. Although great advancements have been made in traditional
array-based computations, most are limited by the resources available on a
single computation node. Consequently, novel approaches are needed to exploit
distributed resources, e.g., distributed memory architectures. To this end, we
introduce HeAT, an array-based numerical programming framework for large-scale
parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch
as a node-local eager execution engine and distributes the workload on
arbitrarily large high-performance computing systems via MPI. It provides both
low-level array computations and assorted higher-level algorithms. With
HeAT, it is possible for a NumPy user to take full advantage of their available
resources, significantly lowering the barrier to distributed data analysis.
When compared to similar frameworks, HeAT achieves speedups of up to two orders
of magnitude.
Comment: 10 pages, 8 figures, 5 listings, 1 table
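To make the notion of distributing an array workload concrete, here is a minimal single-process sketch of a balanced block split followed by a reduction. It does not use HeAT, PyTorch, or MPI: the function names are hypothetical, the ranks are simulated with a loop, and the final `sum` merely stands in for a collective reduction across processes.

```python
def chunk_bounds(length, n_ranks, rank):
    """Half-open [start, stop) range owned by `rank` in a balanced block split.

    The first `length % n_ranks` ranks each receive one extra element.
    """
    base, extra = divmod(length, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def distributed_sum(data, n_ranks):
    """Sum `data` by letting each simulated rank reduce its own chunk."""
    partial = []
    for rank in range(n_ranks):
        start, stop = chunk_bounds(len(data), n_ranks, rank)
        partial.append(sum(data[start:stop]))  # node-local computation
    return sum(partial)  # stand-in for a reduction across ranks
```

For ten elements on three ranks, the chunks are `[0, 4)`, `[4, 7)`, and `[7, 10)`, and the result equals the ordinary sum of the data.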
Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)
With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) to utilize large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations. This synchronization is the central algorithmic bottleneck. To combat this, we introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, as compared to other existing data parallel training methods.
RNA contact prediction by data efficient deep learning
On the path to a full understanding of the structure-function relationship or even the design of RNA, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited data available, we here focus on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines the utilization of unlabeled data through self-supervised pre-training and efficient use of the sparse labeled data through an XGBoost classifier. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. In order to demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction.